Q: Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results. Comment on the output. For example:
library(MASS)
library(ISLR)
auto <- lm(mpg ~ horsepower, data = Auto)
summary(auto)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
i: The p-value for the horsepower coefficient is very small (<< 0.05), so we reject the null hypothesis that the coefficient is zero. There is therefore evidence of a relationship between the predictor and the response.
ii: The R-squared is 0.6059, meaning horsepower explains about 61% of the variance in mpg; this measures the strength of the relationship.
iii: The relationship between the response and the predictor is negative, as indicated by the sign of the parameter estimate for horsepower (beta1-hat = -0.157845).
predict(auto, data.frame(horsepower = 98), interval = "confidence")
## fit lwr upr
## 1 24.46708 23.97308 24.96108
predict(auto, data.frame(horsepower = 98), interval = "prediction")
## fit lwr upr
## 1 24.46708 14.8094 34.12476
iv: The predicted value of mpg for horsepower = 98 is 24.46708. The 95% confidence interval is (23.97308, 24.96108) and the 95% prediction interval is (14.8094, 34.12476); the prediction interval is wider because it also accounts for the irreducible error in an individual observation.
# Scatterplot with the fitted regression line. Note: converting the variables
# to factors (as originally attempted) would produce boxplots and break abline().
plot(Auto$horsepower, Auto$mpg, col = "blue", xlab = "horsepower", ylab = "mpg")
abline(auto, col = "red", lwd = 2)
Q: Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
par(mfrow=c(2,2))
plot(auto)
The residuals-vs-fitted plot is not a random scatter: a clear U shape is visible, which suggests a non-linear relationship between the predictor and the response.
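One hedged way to follow up on the U shape is to compare the linear fit against a quadratic fit of horsepower with an F-test (a sketch, assuming the ISLR package and its Auto data set are available):

```r
# Compare the straight-line fit with a quadratic polynomial fit.
library(ISLR)
fit_lin  <- lm(mpg ~ horsepower, data = Auto)
fit_quad <- lm(mpg ~ poly(horsepower, 2), data = Auto)
# A small p-value here supports adding the non-linear term.
anova(fit_lin, fit_quad)
```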
This question involves the use of multiple linear regression on the Auto data set.
qualitative_columns <- c(2, 8, 9)  # cylinders, origin and name are best treated as qualitative
pairs(Auto)  # almost too complicated to work with
b Compute the matrix of correlations between the variables using the function cor(). You will need to exclude the name variable, which is qualitative.
Autowithoutnames <- Auto
Autowithoutnames$name <- NULL
cor(Autowithoutnames)
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
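Several of the correlations above are strong, which hints at collinearity among the predictors. A quick sketch for flagging the strongly correlated pairs (the 0.8 cutoff is an arbitrary choice, not from the exercise):

```r
# List predictor pairs with |correlation| above 0.8 as a collinearity check.
library(ISLR)
cm   <- cor(subset(Auto, select = -name))
high <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
data.frame(var1 = rownames(cm)[high[, 1]],
           var2 = colnames(cm)[high[, 2]],
           r    = cm[high])
```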
c Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables except name as the predictors. Use the summary() function to print the results. Comment on the output. For instance:
lm1 <- lm(mpg ~ . - name, data = Auto)
summary(lm1)
##
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
The p-value for the F-statistic is far below 0.05, so we reject the null hypothesis and conclude that at least one predictor is significant in predicting mpg.
Using the coefficient p-values with a 0.05 threshold for significance, displacement, weight, year and origin have a statistically significant relationship with the response, while cylinders, horsepower and acceleration do not.
The coefficient of year (0.750773) is significant and positive, suggesting that, holding all other variables constant, mpg increases by about 0.75 on average with each additional model year.
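The point estimates can be supplemented with interval estimates; intervals that exclude zero correspond to the significant predictors noted above (a sketch, assuming the ISLR Auto data):

```r
# 95% confidence intervals for the multiple-regression coefficients.
library(ISLR)
lm1 <- lm(mpg ~ . - name, data = Auto)
confint(lm1)
```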
d Use the plot() function to produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?
par(mfrow=c(2,2))
plot(lm1)
The residuals-vs-fitted plot again shows a U shape, which suggests non-linearity in the relationship.
plot(predict(lm1),rstudent(lm1))
The studentized-residuals-vs-fitted plot shows that certain observations have studentized residuals greater than 3, indicating possible outliers.
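The observations behind those large residuals can be pulled out directly (a sketch; the cutoff of 3 follows the rule of thumb used above):

```r
# Indices of observations whose studentized residuals exceed 3 in absolute value.
library(ISLR)
lm1 <- lm(mpg ~ . - name, data = Auto)
which(abs(rstudent(lm1)) > 3)
```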
plot(hatvalues(lm1))
which.max(hatvalues(lm1))
## 14
## 14
which.max() gives the index of the observation with the highest leverage statistic (observation 14).
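To judge whether that leverage is unusually high, it can be compared against the average leverage (p + 1)/n; a common rule of thumb (an assumption here, not stated in the exercise) flags points several times the average:

```r
# Compare the maximum leverage with the average leverage (p + 1) / n.
library(ISLR)
lm1 <- lm(mpg ~ . - name, data = Auto)
h   <- hatvalues(lm1)
avg <- length(coef(lm1)) / length(h)  # (p + 1) / n
c(max_leverage = max(h), average = avg, ratio = max(h) / avg)
```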
e Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?
lm2 <- lm(mpg ~ .:., data = Autowithoutnames)
summary(lm2)
##
## Call:
## lm(formula = mpg ~ .:., data = Autowithoutnames)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6303 -1.4481 0.0596 1.2739 11.1386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.548e+01 5.314e+01 0.668 0.50475
## cylinders 6.989e+00 8.248e+00 0.847 0.39738
## displacement -4.785e-01 1.894e-01 -2.527 0.01192 *
## horsepower 5.034e-01 3.470e-01 1.451 0.14769
## weight 4.133e-03 1.759e-02 0.235 0.81442
## acceleration -5.859e+00 2.174e+00 -2.696 0.00735 **
## year 6.974e-01 6.097e-01 1.144 0.25340
## origin -2.090e+01 7.097e+00 -2.944 0.00345 **
## cylinders:displacement -3.383e-03 6.455e-03 -0.524 0.60051
## cylinders:horsepower 1.161e-02 2.420e-02 0.480 0.63157
## cylinders:weight 3.575e-04 8.955e-04 0.399 0.69000
## cylinders:acceleration 2.779e-01 1.664e-01 1.670 0.09584 .
## cylinders:year -1.741e-01 9.714e-02 -1.793 0.07389 .
## cylinders:origin 4.022e-01 4.926e-01 0.816 0.41482
## displacement:horsepower -8.491e-05 2.885e-04 -0.294 0.76867
## displacement:weight 2.472e-05 1.470e-05 1.682 0.09342 .
## displacement:acceleration -3.479e-03 3.342e-03 -1.041 0.29853
## displacement:year 5.934e-03 2.391e-03 2.482 0.01352 *
## displacement:origin 2.398e-02 1.947e-02 1.232 0.21875
## horsepower:weight -1.968e-05 2.924e-05 -0.673 0.50124
## horsepower:acceleration -7.213e-03 3.719e-03 -1.939 0.05325 .
## horsepower:year -5.838e-03 3.938e-03 -1.482 0.13916
## horsepower:origin 2.233e-03 2.930e-02 0.076 0.93931
## weight:acceleration 2.346e-04 2.289e-04 1.025 0.30596
## weight:year -2.245e-04 2.127e-04 -1.056 0.29182
## weight:origin -5.789e-04 1.591e-03 -0.364 0.71623
## acceleration:year 5.562e-02 2.558e-02 2.174 0.03033 *
## acceleration:origin 4.583e-01 1.567e-01 2.926 0.00365 **
## year:origin 1.393e-01 7.399e-02 1.882 0.06062 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.695 on 363 degrees of freedom
## Multiple R-squared: 0.8893, Adjusted R-squared: 0.8808
## F-statistic: 104.2 on 28 and 363 DF, p-value: < 2.2e-16
The significant terms (at the 0.05 level) are those with at least one asterisk (*). A significance level of 0.05 is probably unreasonable here, since we are testing such a large number of hypotheses; a lower threshold, or a p-value correction using the p.adjust() function, would be more appropriate.
Using the standard threshold of 0.05, the significant interaction terms in the output above are:
displacement:year, acceleration:year and acceleration:origin.
The adjusted R-squared increased from 0.82 to 0.88 with the addition of the interaction terms.
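The multiple-testing correction mentioned above can be sketched as follows (using the Benjamini-Hochberg method as one illustrative choice):

```r
# Adjust the interaction-model coefficient p-values for multiple testing
# and inspect the smallest adjusted values.
library(ISLR)
lm2 <- lm(mpg ~ .:., data = subset(Auto, select = -name))
p   <- summary(lm2)$coefficients[, 4]   # column of Pr(>|t|) values
head(sort(p.adjust(p, method = "BH")))  # terms surviving the correction
```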
anova(lm1,lm2)
##   Res.Df      RSS Df Sum of Sq        F       Pr(>F)
## 1    384 4252.213
## 2    363 2635.573 21   1616.64 10.60292 7.336624e-27
The ANOVA tests the null hypothesis that the two models fit equally well; the very small p-value leads us to reject it and conclude that the interaction model (model 2) is a significant improvement over model 1.
f Try a few different transformations of the variables, such as log(X), sqrt(X), X^2. Comment on your findings.
lm3 <- lm(mpg ~ weight + I(weight^2), data = Auto)
summary(lm3)
##
## Call:
## lm(formula = mpg ~ weight + I((weight)^2), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.6246 -2.7134 -0.3485 1.8267 16.0866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.226e+01 2.993e+00 20.800 < 2e-16 ***
## weight -1.850e-02 1.972e-03 -9.379 < 2e-16 ***
## I((weight)^2) 1.697e-06 3.059e-07 5.545 5.43e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.176 on 389 degrees of freedom
## Multiple R-squared: 0.7151, Adjusted R-squared: 0.7137
## F-statistic: 488.3 on 2 and 389 DF, p-value: < 2.2e-16
plot(lm3)
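Another transformation worth trying is a log of the response, which often straightens the curved mpg relationships (a sketch; comparing its adjusted R-squared against the quadratic-weight model above):

```r
# Log-transform the response and compare fit quality.
library(ISLR)
lm_log <- lm(log(mpg) ~ weight, data = Auto)
summary(lm_log)$adj.r.squared
```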
# Note: `.` on the full Auto data includes the name factor, so this model fits
# one dummy coefficient per car model and badly overfits (see the output below).
lm3_ <- lm(formula = mpg ~ . + I(horsepower^2) + I(year^2) + acceleration:year +
             acceleration:origin, data = Auto)
summary(lm3_)
##
## Call:
## lm(formula = mpg ~ . + I(horsepower^2) + I(year^2) + acceleration:year +
## acceleration:origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.286 0.000 0.000 0.000 5.286
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.514e+02 1.266e+02 2.776 0.006832
## cylinders -2.835e-01 6.247e-01 -0.454 0.651179
## displacement -3.940e-02 1.789e-02 -2.202 0.030477
## horsepower -2.402e-01 7.106e-02 -3.381 0.001115
## weight 7.179e-04 1.570e-03 0.457 0.648721
## acceleration -2.726e+00 2.986e+00 -0.913 0.363963
## year -7.935e+00 3.386e+00 -2.344 0.021537
## origin -2.090e+00 5.045e+00 -0.414 0.679775
## ... (one dummy coefficient per car name; several hundred rows omitted) ...
## namevw rabbit custom NA NA NA NA
## I(horsepower^2) 6.921e-04 2.467e-04 2.805 0.006297
## I(year^2) 5.370e-02 2.314e-02 2.320 0.022836
## acceleration:year 2.449e-02 3.946e-02 0.621 0.536606
## acceleration:origin 1.565e-01 2.310e-01 0.678 0.499882
##
## (Intercept) **
## cylinders
## displacement *
## horsepower **
## weight
## acceleration
## year *
## origin
## ... (significance codes for the name dummies omitted) ...
## nameoldsmobile cutlass ls **
## nameoldsmobile cutlass salon brougham
## nameoldsmobile cutlass supreme
## nameoldsmobile delta 88 royale
## nameoldsmobile omega
## nameoldsmobile omega brougham
## nameoldsmobile starfire sx
## nameoldsmobile vista cruiser
## nameopel 1900
## nameopel manta
## namepeugeot 304
## namepeugeot 504
## namepeugeot 504 (sw)
## namepeugeot 505s turbo diesel
## namepeugeot 604sl
## nameplymouth 'cuda 340
## nameplymouth arrow gs
## nameplymouth champ
## nameplymouth cricket
## nameplymouth custom suburb
## nameplymouth duster
## nameplymouth fury
## nameplymouth fury gran sedan
## nameplymouth fury iii
## nameplymouth grand fury
## nameplymouth horizon
## nameplymouth horizon 4
## nameplymouth horizon miser
## nameplymouth horizon tc3
## nameplymouth reliant
## nameplymouth sapporo
## nameplymouth satellite
## nameplymouth satellite custom
## nameplymouth satellite custom (sw)
## nameplymouth satellite sebring
## nameplymouth valiant
## nameplymouth valiant custom
## nameplymouth volare
## nameplymouth volare custom
## nameplymouth volare premier v8
## namepontiac astro
## namepontiac catalina
## namepontiac catalina brougham
## namepontiac firebird
## namepontiac grand prix
## namepontiac grand prix lj
## namepontiac j2000 se hatchback
## namepontiac lemans v6
## namepontiac phoenix
## namepontiac phoenix lj
## namepontiac safari (sw)
## namepontiac sunbird coupe
## namepontiac ventura sj
## namerenault 12 (sw)
## namerenault 12tl
## namerenault 5 gtl .
## namesaab 99e
## namesaab 99gle
## namesaab 99le
## namesubaru
## namesubaru dl
## nametoyota carina
## nametoyota celica gt
## nametoyota celica gt liftback
## nametoyota corolla
## nametoyota corolla 1200
## nametoyota corolla 1600 (sw)
## nametoyota corolla liftback
## nametoyota corolla tercel
## nametoyota corona
## nametoyota corona hardtop
## nametoyota corona liftback
## nametoyota corona mark ii
## nametoyota cressida
## nametoyota mark ii
## nametoyota starlet
## nametoyota tercel
## nametoyouta corona mark ii (sw)
## nametriumph tr7 coupe .
## namevokswagen rabbit
## namevolkswagen 1131 deluxe sedan
## namevolkswagen 411 (sw)
## namevolkswagen dasher
## namevolkswagen jetta
## namevolkswagen model 111
## namevolkswagen rabbit
## namevolkswagen rabbit custom
## namevolkswagen rabbit custom diesel ***
## namevolkswagen rabbit l
## namevolkswagen scirocco
## namevolkswagen super beetle
## namevolkswagen type 3
## namevolvo 144ea
## namevolvo 145e (sw)
## namevolvo 244dl
## namevolvo 245
## namevolvo 264gl .
## namevolvo diesel
## namevw dasher (diesel) **
## namevw pickup **
## namevw rabbit .
## namevw rabbit c (diesel) **
## namevw rabbit custom
## I(horsepower^2) **
## I(year^2) *
## acceleration:year
## acceleration:origin
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.064 on 81 degrees of freedom
## Multiple R-squared: 0.9855, Adjusted R-squared: 0.9301
## F-statistic: 17.77 on 310 and 81 DF, p-value: < 2.2e-16
plot(lm3_)
## Warning: not plotting observations with leverage one:
## 2, 3, 4, 5, 10, 11, 12, 13, 15, 20, 22, 23, 24, 26, 27, 28, 29, 31, 34, 36, 39, 42, 44, 45, 46, 47, 48, 49, 51, 52, 54, 55, 56, 57, 58, 59, 61, 66, 67, 68, 69, 70, 71, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 89, 90, 93, 94, 95, 96, 98, 102, 104, 105, 106, 110, 111, 113, 114, 115, 116, 120, 121, 124, 128, 134, 136, 137, 140, 147, 150, 151, 153, 156, 157, 160, 163, 164, 165, 169, 175, 178, 181, 183, 185, 187, 195, 198, 199, 200, 201, 203, 206, 207, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 221, 222, 223, 224, 226, 227, 228, 230, 231, 232, 233, 234, 235, 237, 240, 241, 242, 243, 244, 245, 246, 249, 250, 251, 253, 254, 255, 257, 258, 260, 262, 263, 264, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 285, 286, 287, 290, 291, 292, 293, 294, 295, 296, 297, 300, 301, 303, 304, 306, 309, 311, 313, 316, 317, 319, 321, 324, 325, 326, 327, 328, 330, 331, 332, 333, 337, 340, 341, 342, 344, 345, 346, 347, 348, 349, 350, 351, 353, 356, 357, 358, 360, 361, 362, 363, 364, 365, 366, 367, 369, 370, 371, 372, 373, 374, 375, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
The adjusted R² has improved markedly (0.9301 for this fit), which looks impressive given that no additional data have been collected. Note, however, that treating name as a factor adds hundreds of dummy variables, leaving only 81 residual degrees of freedom for 392 observations, so much of the apparent gain reflects a nearly saturated model rather than genuine explanatory power.
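The parameter count driving this is visible directly from the data; a quick sketch, assuming library(ISLR) and the Auto data frame are loaded as at the start of the exercise:

```r
# Count the distinct car names that become dummy variables when `name`
# enters the model as a factor (assumes the ISLR package is installed)
library(ISLR)
nlevels(as.factor(Auto$name))  # hundreds of levels, one dummy per level
nrow(Auto)                     # 392 observations in total
```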
a Fit a multiple regression model to predict Sales using Price, Urban, and US.
sales<-lm(Sales~Price+Urban+US,data=Carseats)
summary(sales)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
b Provide an interpretation of each coefficient in the model. Be careful-some of the variables in the model are qualitative!
Price = -0.054
Interpretation: holding Urban and US fixed, a one-unit increase in Price is associated with a change in Sales of about -0.054 units (roughly 54 fewer seats sold, since Sales is recorded in thousands).
Urban = -0.022
Interpretation: holding Price and US fixed, a store in an urban location sells about 0.022 units (22 seats) fewer than a rural one. However, the p-value for this variable's t-test is very high (0.936), so there is no evidence of a relationship between car seat Sales and whether the store is urban or rural.
US = 1.200
Interpretation: holding Price and Urban fixed, a store in the US sells about 1.2 units (1,200 seats) more than one outside the US.
c Write out the model in equation form, being careful to handle the qualitative variables properly.
Sales =13.043469−0.054459⋅Price−0.021916⋅Urban+1.200573⋅US Where:
Urban = 1 for a store in an urban location, else 0
US = 1 for a store in the US, else 0
d For which of the predictors can you reject the null hypothesis H0:βj=0?
The null hypothesis can be rejected for Price and USYes, since their p-values are well below 0.05.
e On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
sales1<-lm(Sales~Price+US,data=Carseats)
summary(sales1)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
f How well do the models in (a) and (e) fit the data?
anova(sales,sales1)
## Analysis of Variance Table
##
## Model 1: Sales ~ Price + Urban + US
## Model 2: Sales ~ Price + US
##   Res.Df      RSS Df Sum of Sq      F Pr(>F)
## 1    396 2420.835
## 2    397 2420.874 -1   -0.0398 0.0065 0.9357
For model (a): R² = 0.2393, adjusted R² = 0.2335.
For model (e): R² = 0.2393, adjusted R² = 0.2354.
Both models explain about 23.9% of the variance in Sales. Removing Urban gives a slight increase in adjusted R² and a slight decrease in the residual standard error, but the ANOVA comparison fails to reject the null hypothesis (p = 0.936), so we conclude that the two models do not differ significantly.
g Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
confint(sales1)
## 2.5 % 97.5 %
## (Intercept) 11.79032020 14.27126531
## Price -0.06475984 -0.04419543
## USYes 0.69151957 1.70776632
h Is there evidence of outliers or high leverage observations in the model from (e)
plot(predict(sales1),rstudent(sales1) ,col="blue")
abline(2, 0, col = "red")
abline(-2, 0 , col = "red")
lev<-hat(model.matrix(sales1))
plot(lev)
We use the plot of studentized residuals (each residual divided by its estimated standard deviation) against fitted values to detect outliers, i.e. observations whose response deviates markedly from the trend the model predicts. Since all studentized residuals lie within ±3, there is no strong evidence of outliers in the data.
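The same check can be done numerically rather than by eye; a small sketch, assuming the sales1 model from part (e) is in scope:

```r
# Indices of observations whose studentized residual exceeds the usual
# |3| cutoff (the result is empty when no outliers are flagged)
outliers <- which(abs(rstudent(sales1)) > 3)
outliers
```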
4/nrow(Carseats)
plot(Carseats$Sales,Carseats$Price)
points(Carseats[lev>0.01,]$Sales,Carseats[lev>0.01,]$Price,col='red')
With p = 2 predictors, the average leverage is (p + 1)/n = 3/400 = 0.0075; the code uses the slightly larger round cutoff 4/400 = 0.01 and colours the points whose leverage exceeds it red, marking them as potential high-leverage observations.
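Base R also exposes leverage directly via hatvalues(); a sketch of the same check, again assuming sales1 is in scope:

```r
# Leverage (hat) values for each observation; the rule of thumb flags
# points well above the average leverage (p + 1)/n
h <- hatvalues(sales1)
mean(h)                 # average leverage = (p + 1)/n
which(h > 3 * mean(h))  # indices of clearly high-leverage points
```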
In this problem we will investigate the t-statistic for the null hypothesis H0:β=0 in simple linear regression without an intercept. To begin, we generate a predictor x and a response y as follows.
set.seed(1)
x = rnorm (100)
y = 2*x + rnorm(100)
a Perform a simple linear regression of y onto x, without an intercept. Report the coefficient estimate βa^, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis H0:βa=0. Comment on these results
set.seed(1)
x=rnorm(100)
y=2*x+rnorm(100)
slr<-lm(y~x+0)
summary(slr)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
β̂a = 1.9939; SE(β̂a) = 0.1065; t = β̂a / SE(β̂a) = 18.726; p ≈ 2.64 × 10^-34. The model p-value strongly suggests that there is a relationship between y & x.
lm describes a relationship of the form:
y=βa⋅x=1.9939⋅x
b Perform a simple linear regression of x onto y, without an intercept. Report the coefficient estimate β̂b, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis H0:βb=0. Comment on these results
revslr<-lm(x~y+0)
summary(revslr)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8699 -0.2368 0.1030 0.2858 0.8938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.39111 0.02089 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The coefficient estimate is 0.3911 with standard error 0.0209. The t-statistic, obtained by dividing the parameter estimate by its standard error, is 18.73, and the associated p-value is far below 0.05, so we reject the null hypothesis that the coefficient of y is zero.
Also
We have:
β̂b = 0.3911; SE(β̂b) = 0.0209; t = β̂b / SE(β̂b) = 18.726; p ≈ 2.64 × 10^-34.
The model p-value strongly suggests that there is a relationship between x & y.
lm_2 describes a relationship of the form:
x = β̂b ⋅ y = 0.3911 ⋅ y
c What is the relationship between the results obtained in (a) and (b)?
We get the same t-statistic and p-value in both cases. The slope estimates, however, are not reciprocals of each other (0.3911 ≠ 1/1.9939 ≈ 0.5015), so the fit y = β̂a x cannot simply be inverted as x = (1/β̂a) y to recover the regression of x on y.
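There is a tidy numerical relationship between the two slopes: their product equals the (uncentered) R² that both summaries report. A self-contained sketch reproducing the simulation above:

```r
# With no intercept, beta_a = sum(x*y)/sum(x^2) and beta_b = sum(x*y)/sum(y^2),
# so their product is (sum(x*y))^2 / (sum(x^2) * sum(y^2)) -- the shared R^2
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
b_a <- coef(lm(y ~ x + 0))
b_b <- coef(lm(x ~ y + 0))
unname(b_a * b_b)  # ~0.7798, matching the Multiple R-squared in both outputs
```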
d
n=length(x)
t=sqrt(n - 1)*(x %*% y)/sqrt(sum(x^2) * sum(y^2) - (x %*% y)^2)
as.numeric(t)
## [1] 18.72593
lm_1 <- lm(y ~ x + 0)
summary(lm_1)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
From the formula we obtain a t-statistic of 18.7259, identical to the value reported in the model output (the parameter estimate divided by its standard error). This verifies the formula numerically.
e Using the results from (d), argue that the t-statistic for the regression of y onto x is the same as the t-statistic for the regression of x onto y.
The formula is symmetric in x and y: swapping the two vectors leaves the inner product x·y unchanged and simply exchanges Σx² and Σy², which appear symmetrically, so the regression of y onto x and the regression of x onto y yield the same t-statistic.
f In R, show that when regression is performed with an intercept, the t-statistic for H0:β1=0 is the same for the regression of y onto x as it is for the regression of x onto y.
revslr1<-lm(x~y)
summary(revslr1)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
slr1<-lm(y~x)
summary(slr1)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
We obtain the same t-statistic, 18.56, in both cases, as expected.
a Recall that the coefficient estimate βa^ for the linear regression of y onto x without an intercept is given by (3.38). Under what circumstance is the coefficient estimate for the regression of x onto y the same as the coefficient estimate for the regression of y onto x?
From equation (3.38) it is clear that the two parameter estimates are equal exactly when Σxᵢ² = Σyᵢ².
b Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of x onto y is different from the coefficient estimate for the regression of y onto x
x=rnorm(100)
y=rbinom(100,2,0.3)
eg<-lm(y~x+0)
summary(eg)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.14889 0.01761 0.91274 1.03282 2.19132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 0.08372 0.09067 0.923 0.358
##
## Residual standard error: 0.9334 on 99 degrees of freedom
## Multiple R-squared: 0.008539, Adjusted R-squared: -0.001476
## F-statistic: 0.8526 on 1 and 99 DF, p-value: 0.3581
eg1<-lm(x~y+0)
summary(eg1)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.88892 -0.56513 -0.02129 0.61989 2.54718
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.1020 0.1105 0.923 0.358
##
## Residual standard error: 1.03 on 99 degrees of freedom
## Multiple R-squared: 0.008539, Adjusted R-squared: -0.001476
## F-statistic: 0.8526 on 1 and 99 DF, p-value: 0.3581
Here we get different coefficients in the two cases: for case 1 (y ~ x) the estimate is 0.0837, while for case 2 (x ~ y) it is 0.1020.
c Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of x onto y is the same as the coefficient estimate for the regression of y onto x
x=1:100
y=100:1
eg3<-lm(y~x+0)
summary(eg3)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.75 -12.44 24.87 62.18 99.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 0.5075 0.0866 5.86 6.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50.37 on 99 degrees of freedom
## Multiple R-squared: 0.2575, Adjusted R-squared: 0.25
## F-statistic: 34.34 on 1 and 99 DF, p-value: 6.094e-08
eg4<-lm(x~y+0)
summary(eg4)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.75 -12.44 24.87 62.18 99.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.5075 0.0866 5.86 6.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50.37 on 99 degrees of freedom
## Multiple R-squared: 0.2575, Adjusted R-squared: 0.25
## F-statistic: 34.34 on 1 and 99 DF, p-value: 6.094e-08
Here we get the same coefficient in both cases: for case 1 (y ~ x) and case 2 (x ~ y) the estimate is 0.5075, because Σxᵢ² = Σyᵢ² for these vectors.
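The equality condition from part (a) can be verified directly for these vectors:

```r
# y is just x reversed, so the two sums of squares are identical and the
# two no-intercept slope estimates must coincide
x <- 1:100
y <- 100:1
sum(x^2) == sum(y^2)  # TRUE
```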
a Using the rnorm() function, create a vector, x, containing 100 observations drawn from a N(0,1) distribution. This represents a feature, x
set.seed(1)
x=rnorm(100)
b Using the rnorm() function, create a vector, eps, containing 100 observations drawn from a N(0, 0.25) distribution i.e. a normal distribution with mean zero and variance 0.25.
eps<-rnorm(100,mean=0,sd=sqrt(0.25))
c Using x and eps, generate a vector y according to the model Y = −1 + 0.5X + ε. What is the length of the vector y? What are the values of β0 and β1 in this linear model?
y <- -1+0.5*x+eps
length(y)
## [1] 100
The length of the vector y is 100. The coefficients β0 and β1 are −1 and 0.5 respectively.
d Create a scatterplot displaying the relationship between x and y. Comment on what you observe.
plot(x,y)
As expected from the generating equation y = −1 + 0.5x + eps, the plot shows a linear relationship between x and y with some noise around the line.
e Fit a least squares linear model to predict y using x. Comment on the model obtained. How do β0^ and β1^ compare to β0 and β1?
sim<-lm(y~x)
summary(sim)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93842 -0.30688 -0.06975 0.26970 1.17309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.01885 0.04849 -21.010 < 2e-16 ***
## x 0.49947 0.05386 9.273 4.58e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4814 on 98 degrees of freedom
## Multiple R-squared: 0.4674, Adjusted R-squared: 0.4619
## F-statistic: 85.99 on 1 and 98 DF, p-value: 4.583e-15
The coefficient estimates from the fitted model sim are close to the true values −1 and 0.5. The adjusted R² is 0.4619, so the model explains about 46.19% of the variation in y.
f Display the least squares line on the scatterplot obtained in (d). Draw the population regression line on the plot, in a different color. Use the legend() command to create an appropriate legend.
plot(x,y)
abline(sim,col='red')
abline(-1,0.5,col="green")
legend("topleft",c("Least square","Population"),col=c("red","green"),lty=c(1,1))
g Now fit a polynomial regression model that predicts y using x and x^2. Is there evidence that the quadratic term improves the model fit? Explain your answer.
polyn<-lm(y~x+I(x^2))
summary(polyn)
##
## Call:
## lm(formula = y ~ x + I(x^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.98252 -0.31270 -0.06441 0.29014 1.13500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.97164 0.05883 -16.517 < 2e-16 ***
## x 0.50858 0.05399 9.420 2.4e-15 ***
## I(x^2) -0.05946 0.04238 -1.403 0.164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.479 on 97 degrees of freedom
## Multiple R-squared: 0.4779, Adjusted R-squared: 0.4672
## F-statistic: 44.4 on 2 and 97 DF, p-value: 2.038e-14
anova(polyn,sim)
## Analysis of Variance Table
##
## Model 1: y ~ x + I(x^2)
## Model 2: y ~ x
##   Res.Df      RSS Df Sum of Sq      F Pr(>F)
## 1     97 22.25728
## 2     98 22.70890 -1  -0.45163 1.9682 0.1638
Adding the x² term does not improve the model. The ANOVA comparison fails to reject the null hypothesis that the simpler model fits as well (p = 0.164), and the p-value of the x² coefficient (0.164) is well above 0.05, indicating that it is statistically insignificant.
h Repeat (a)-(f) after modifying the data generation process in such a way that there is less noise in the data. The model (3.39) should remain the same. You can do this by decreasing the variance of the normal distribution used to generate the error term ϵ in (b). Describe your results.
set.seed(1)
x=rnorm(100)
eps<-rnorm(100,mean=0,sd=sqrt(0.1))
y=-1+0.5*x+eps
simlow<-lm(y~x)
summary(simlow)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59351 -0.19409 -0.04411 0.17057 0.74193
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.01192 0.03067 -32.99 <2e-16 ***
## x 0.49966 0.03407 14.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3044 on 98 degrees of freedom
## Multiple R-squared: 0.687, Adjusted R-squared: 0.6838
## F-statistic: 215.1 on 1 and 98 DF, p-value: < 2.2e-16
plot(x,y)
abline(simlow,col='red')
abline(-1,0.5,col="blue")
legend("topleft",c("Least square line","True Population line - less Variance"),col=c("red","blue"),lty=c(1,1))
Here we reduced the noise by decreasing the variance of the error term while keeping the model equation the same. The R² has increased (0.687 versus 0.467 before), and the scatterplot shows a tighter, more clearly linear relationship.
i Repeat (a)-(f) after modifying the data generation process in such a way that there is more noise in the data. The model (3.39) should remain the same. You can do this by increasing the variance of the normal distribution used to generate the error term ϵ in (b). Describe your results.
set.seed(1)
x=rnorm(100)
eps<-rnorm(100,mean=0,sd=sqrt(4))
y=-1+0.5*x+eps
simhigh<-lm(y~x)
summary(simhigh)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.754 -1.228 -0.279 1.079 4.692
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0754 0.1940 -5.544 2.5e-07 ***
## x 0.4979 0.2155 2.311 0.0229 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.926 on 98 degrees of freedom
## Multiple R-squared: 0.05167, Adjusted R-squared: 0.042
## F-statistic: 5.34 on 1 and 98 DF, p-value: 0.02294
plot(x,y)
abline(simhigh,col='orange')
abline(-1,0.5,col="blue")
legend("topleft",c("Least square line","True Population line - high Variance"),col=c("orange","blue"),lty=c(1,1))
Here we increased the noise by increasing the variance of the error term while keeping the model equation the same. The R² has dropped sharply (0.052 versus 0.467 before), and the scatterplot shows a much weaker linear pattern.
j What are the confidence intervals for β0 and β1 based on the original data set, the noisier data set, and the less noisy data set? Comment on your results.
The 95% confidence intervals for the coefficients are:
Less noisy data set: β0 ∈ (−1.07, −0.95), β1 ∈ (0.43, 0.57)
Original data set: β0 ∈ (−1.12, −0.92), β1 ∈ (0.39, 0.61)
Noisier data set: β0 ∈ (−1.46, −0.69), β1 ∈ (0.07, 0.93)
As the noise increases, the confidence intervals get wider.
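These intervals can be reproduced with confint(), assuming the fitted models sim, simlow, and simhigh from parts (e), (h), and (i) are still in scope:

```r
# 95% confidence intervals for each fit; interval width tracks the noise
confint(simlow)   # least noise: narrowest intervals
confint(sim)      # original noise level
confint(simhigh)  # most noise: widest intervals
```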
This problem focuses on the collinearity problem.
a Perform the following commands in R: set.seed(1)
x1=runif (100)
x2=0.5*x1+rnorm (100)/10
y=2+2*x1+0.3*x2+rnorm(100) The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?
set.seed(1)
x1=runif (100)
x2=0.5*x1+rnorm (100)/10
y=2+2*x1+0.3*x2+rnorm (100)
The form of the model is Y = 2 + 2·X1 + 0.3·X2 + ε. The regression coefficients are β0 = 2, β1 = 2, and β2 = 0.3.
b What is the correlation between x1 and x2? Create a scatterplot displaying the relationship between the variables
cor(x1,x2)
## [1] 0.8351212
plot(x1,x2)
The correlation between x1 and x2 is 0.8351. The scatterplot shows a strong positive linear relationship between x1 and x2.
c Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are β0^, β1^, and β2^? How do these relate to the true β0, β1, and β2? Can you reject the null hypothesis H0:β1=0? How about the null hypothesis H0:β2=0?
coll<-lm(y~x1+x2)
summary(coll)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8311 -0.7273 -0.0537 0.6338 2.3359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1305 0.2319 9.188 7.61e-15 ***
## x1 1.4396 0.7212 1.996 0.0487 *
## x2 1.0097 1.1337 0.891 0.3754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925
## F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05
β̂0, β̂1, and β̂2 are 2.1305, 1.4396, and 1.0097 respectively, noticeably far from the true coefficients 2, 2, and 0.3. At α = 0.05 we can reject the null hypothesis for the intercept and for x1 (p = 0.0487), but we cannot reject it for x2 (p = 0.3754).
d Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis H0:β1=0?
collx1<-lm(y~x1)
summary(collx1)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.89495 -0.66874 -0.07785 0.59221 2.45560
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1124 0.2307 9.155 8.27e-15 ***
## x1 1.9759 0.3963 4.986 2.66e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared: 0.2024, Adjusted R-squared: 0.1942
## F-statistic: 24.86 on 1 and 98 DF, p-value: 2.661e-06
Based on the p-value (< 0.05) we can reject the null hypothesis β1 = 0.
e Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis H0:β1=0?
collx2<-lm(y~x2)
summary(collx2)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.62687 -0.75156 -0.03598 0.72383 2.44890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3899 0.1949 12.26 < 2e-16 ***
## x2 2.8996 0.6330 4.58 1.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared: 0.1763, Adjusted R-squared: 0.1679
## F-statistic: 20.98 on 1 and 98 DF, p-value: 1.366e-05
Based on the p-value (< 0.05) we can reject the null hypothesis β1 = 0.
f Do the results obtained in (c)-(e) contradict each other? Explain your answer.
In (c) we could not reject the null hypothesis for x2, while in (e) we could reject it and declare x2 statistically significant. These results do not truly contradict each other; they are a consequence of the collinearity between x1 and x2. When both predictors are in the model, the effect of x2 is masked by x1 and its standard error is inflated, so we fail to reject a null hypothesis that is in fact false, thereby increasing the Type II error rate.
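The severity of the collinearity can be quantified with a variance inflation factor; a sketch computed by hand from the x1 and x2 vectors above (avoiding a package dependency, though car::vif on the full model gives the same figure in a two-predictor fit):

```r
# VIF = 1 / (1 - R^2) from regressing one predictor on the other; in a
# simple regression that R^2 is just the squared correlation
r2 <- summary(lm(x2 ~ x1))$r.squared
1 / (1 - r2)  # ~3.3 here, given cor(x1, x2) = 0.835
```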
g Now suppose we obtain one additional observation, which was unfortunately mismeasured.
x1<-c(x1,0.1)
x2<-c(x2,0.8)
y=c(y,6)
model1<-lm(y~x1+x2)
summary(model1)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.73348 -0.69318 -0.05263 0.66385 2.30619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2267 0.2314 9.624 7.91e-16 ***
## x1 0.5394 0.5922 0.911 0.36458
## x2 2.5146 0.8977 2.801 0.00614 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.075 on 98 degrees of freedom
## Multiple R-squared: 0.2188, Adjusted R-squared: 0.2029
## F-statistic: 13.72 on 2 and 98 DF, p-value: 5.564e-06
par(mfrow=c(2,2))
plot(model1)
The added point (the last observation, index 101) stands out in the Cook's distance plot, which shows that it is a high-leverage point for this model.
x1<-c(x1,0.1)
x2<-c(x2,0.8)
y=c(y,6)
model2<-lm(y~x1)
summary(model2)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8848 -0.6542 -0.0769 0.6137 3.4510
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3921 0.2454 9.747 3.55e-16 ***
## x1 1.5691 0.4255 3.687 0.000369 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.16 on 100 degrees of freedom
## Multiple R-squared: 0.1197, Adjusted R-squared: 0.1109
## F-statistic: 13.6 on 1 and 100 DF, p-value: 0.0003686
par(mfrow=c(2,2))
plot(model2)
The added point (the last observation) is flagged in both the residuals-vs-fitted and the Cook's distance plots, so for this model it acts as both an outlier and a high-leverage point. (Note that the append lines were run again before this fit, so model2 actually contains 102 observations, which is why it reports 100 degrees of freedom.)
x1<-c(x1,0.1)
x2<-c(x2,0.8)
y=c(y,6)
model3<-lm(y~x2)
summary(model3)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.67781 -0.66511 -0.00773 0.79746 2.27887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2781 0.1850 12.313 < 2e-16 ***
## x2 3.4471 0.5561 6.199 1.25e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.073 on 101 degrees of freedom
## Multiple R-squared: 0.2756, Adjusted R-squared: 0.2684
## F-statistic: 38.43 on 1 and 101 DF, p-value: 1.249e-08
par(mfrow=c(2,2))
plot(model3)
The last point is highlighted in the Cook's distance plot, which shows that it is a high-leverage point.
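The leverage of the new observation can also be quantified numerically rather than read off the diagnostic plots. A sketch reusing the `model1`, `model2`, and `model3` objects fitted above:

```r
# hat values measure leverage; compare the last observation's value with the
# average leverage (p+1)/n -- values far above it mark high-leverage points
tail(hatvalues(model1), 1)
tail(hatvalues(model2), 1)
tail(hatvalues(model3), 1)

# studentized residuals beyond roughly +/-3 flag outliers, consistent with
# the residuals-vs-fitted plot for model2
tail(rstudent(model2), 1)
```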
This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.
a For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.
boston.zn<-lm(crim~zn,data=Boston)
summary(boston.zn)
##
## Call:
## lm(formula = crim ~ zn, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.429 -4.222 -2.620 1.250 84.523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
## zn -0.07393 0.01609 -4.594 5.51e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.435 on 504 degrees of freedom
## Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828
## F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06
boston.indus<-lm(crim~indus,data=Boston)
summary(boston.indus)
##
## Call:
## lm(formula = crim ~ indus, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.972 -2.698 -0.736 0.712 81.813
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.06374 0.66723 -3.093 0.00209 **
## indus 0.50978 0.05102 9.991 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.866 on 504 degrees of freedom
## Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637
## F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16
boston.chas<-lm(crim~chas,data=Boston)
summary(boston.chas)
##
## Call:
## lm(formula = crim ~ chas, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.738 -3.661 -3.435 0.018 85.232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7444 0.3961 9.453 <2e-16 ***
## chas -1.8928 1.5061 -1.257 0.209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.597 on 504 degrees of freedom
## Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
## F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
boston.nox<-lm(crim~nox,data=Boston)
summary(boston.nox)
##
## Call:
## lm(formula = crim ~ nox, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.371 -2.738 -0.974 0.559 81.728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.720 1.699 -8.073 5.08e-15 ***
## nox 31.249 2.999 10.419 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.81 on 504 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756
## F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16
boston.rm<-lm(crim~rm,data=Boston)
summary(boston.rm)
##
## Call:
## lm(formula = crim ~ rm, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.604 -3.952 -2.654 0.989 87.197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.482 3.365 6.088 2.27e-09 ***
## rm -2.684 0.532 -5.045 6.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.401 on 504 degrees of freedom
## Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618
## F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07
boston.age<-lm(crim~age,data=Boston)
summary(boston.age)
##
## Call:
## lm(formula = crim ~ age, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.789 -4.257 -1.230 1.527 82.849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
## age 0.10779 0.01274 8.463 2.85e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.057 on 504 degrees of freedom
## Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227
## F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16
boston.dis<-lm(crim~dis,data=Boston)
summary(boston.dis)
##
## Call:
## lm(formula = crim ~ dis, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.708 -4.134 -1.527 1.516 81.674
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.4993 0.7304 13.006 <2e-16 ***
## dis -1.5509 0.1683 -9.213 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.965 on 504 degrees of freedom
## Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425
## F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16
boston.rad<-lm(crim~rad,data=Boston)
summary(boston.rad)
##
## Call:
## lm(formula = crim ~ rad, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.164 -1.381 -0.141 0.660 76.433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
## rad 0.61791 0.03433 17.998 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.718 on 504 degrees of freedom
## Multiple R-squared: 0.3913, Adjusted R-squared: 0.39
## F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16
boston.tax<-lm(crim~tax,data=Boston)
summary(boston.tax)
##
## Call:
## lm(formula = crim ~ tax, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.513 -2.738 -0.194 1.065 77.696
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
## tax 0.029742 0.001847 16.10 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.997 on 504 degrees of freedom
## Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383
## F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16
boston.ptratio<-lm(crim~ptratio,data=Boston)
summary(boston.ptratio)
##
## Call:
## lm(formula = crim ~ ptratio, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.654 -3.985 -1.912 1.825 83.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
## ptratio 1.1520 0.1694 6.801 2.94e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.24 on 504 degrees of freedom
## Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225
## F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11
boston.black<-lm(crim~black,data=Boston)
summary(boston.black)
##
## Call:
## lm(formula = crim ~ black, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.756 -2.299 -2.095 -1.296 86.822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.553529 1.425903 11.609 <2e-16 ***
## black -0.036280 0.003873 -9.367 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.946 on 504 degrees of freedom
## Multiple R-squared: 0.1483, Adjusted R-squared: 0.1466
## F-statistic: 87.74 on 1 and 504 DF, p-value: < 2.2e-16
boston.lstat<-lm(crim~lstat,data=Boston)
summary(boston.lstat)
##
## Call:
## lm(formula = crim ~ lstat, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.925 -2.822 -0.664 1.079 82.862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.33054 0.69376 -4.801 2.09e-06 ***
## lstat 0.54880 0.04776 11.491 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.664 on 504 degrees of freedom
## Multiple R-squared: 0.2076, Adjusted R-squared: 0.206
## F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16
boston.medv<-lm(crim~medv,data=Boston)
summary(boston.medv)
##
## Call:
## lm(formula = crim ~ medv, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.071 -4.022 -2.343 1.298 80.957
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.79654 0.93419 12.63 <2e-16 ***
## medv -0.36316 0.03839 -9.46 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.934 on 504 degrees of freedom
## Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
## F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16
The models above show that chas is the only predictor without a statistically significant association with per capita crime rate: based on the p-value of its t-statistic (0.209) we cannot reject the null hypothesis. For every other predictor the p-value is far below 0.05, so we reject the null hypothesis and conclude that there is a statistically significant relationship between that predictor and the response.
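The thirteen simple regressions above can also be fitted in one loop, which makes it easy to collect all the slope p-values in a single table. A sketch assuming `Boston` is loaded from `MASS`:

```r
library(MASS)
predictors <- setdiff(names(Boston), "crim")
pvals <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "crim"), data = Boston)
  summary(fit)$coefficients[2, 4]  # p-value of the slope
})
# every predictor except chas falls below the 5% level
sort(pvals)
```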
b Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis H0:βj=0
boston.all<-lm(crim~.,Boston)
summary(boston.all)
##
## Call:
## lm(formula = crim ~ ., data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.924 -2.120 -0.353 1.019 75.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.033228 7.234903 2.354 0.018949 *
## zn 0.044855 0.018734 2.394 0.017025 *
## indus -0.063855 0.083407 -0.766 0.444294
## chas -0.749134 1.180147 -0.635 0.525867
## nox -10.313535 5.275536 -1.955 0.051152 .
## rm 0.430131 0.612830 0.702 0.483089
## age 0.001452 0.017925 0.081 0.935488
## dis -0.987176 0.281817 -3.503 0.000502 ***
## rad 0.588209 0.088049 6.680 6.46e-11 ***
## tax -0.003780 0.005156 -0.733 0.463793
## ptratio -0.271081 0.186450 -1.454 0.146611
## black -0.007538 0.003673 -2.052 0.040702 *
## lstat 0.126211 0.075725 1.667 0.096208 .
## medv -0.198887 0.060516 -3.287 0.001087 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.439 on 492 degrees of freedom
## Multiple R-squared: 0.454, Adjusted R-squared: 0.4396
## F-statistic: 31.47 on 13 and 492 DF, p-value: < 2.2e-16
From the summary we can reject the null hypothesis for **zn**, **dis**, **rad**, **black**, and **medv**, as their p-values are below 0.05.
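The same list can be pulled from the fitted object programmatically; a sketch reusing the `boston.all` model above:

```r
coefs <- summary(boston.all)$coefficients
# names of predictors (intercept excluded) whose slope p-value is below 0.05
rownames(coefs)[-1][coefs[-1, "Pr(>|t|)"] < 0.05]
```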
c How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis
simple<-vector("numeric",0)
simple<-c(simple,boston.zn$coefficients[2])
simple<-c(simple,boston.indus$coefficients[2])
simple<-c(simple,boston.chas$coefficients[2])
simple<-c(simple,boston.nox$coefficients[2])
simple<-c(simple,boston.rm$coefficients[2])
simple<-c(simple,boston.age$coefficients[2])
simple<-c(simple,boston.dis$coefficients[2])
simple<-c(simple,boston.rad$coefficients[2])
simple<-c(simple,boston.tax$coefficients[2])
simple<-c(simple,boston.ptratio$coefficients[2])
simple<-c(simple,boston.black$coefficients[2])
simple<-c(simple,boston.lstat$coefficients[2])
simple<-c(simple,boston.medv$coefficients[2])
multi<-vector("numeric",0)
multi<-c(multi,boston.all$coefficients)
multi<-multi[-1]
plot(simple,multi,col='blue')
The plot shows that each variable's coefficient when fitted alone differs from its coefficient in the model containing all predictors. The most striking case is nox, whose coefficient changes from about 31.2 in the simple regression to about -10.3 in the multiple regression.
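Labelling the points makes the comparison easier to read. A sketch reusing the `simple` and `multi` vectors built above (the names carried by `simple` come from the coefficient vectors):

```r
plot(simple, multi, col = "blue",
     xlab = "Simple regression coefficient",
     ylab = "Multiple regression coefficient")
# label each point with its predictor name; nox stands out far from the others
text(simple, multi, labels = names(simple), pos = 4, cex = 0.7)
```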
d Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form: Y=β0+β1X+β2X2+β3X3+ϵ
boston.zn1<-lm(crim~poly(zn,3),data=Boston)
summary(boston.zn1)
##
## Call:
## lm(formula = crim ~ poly(zn, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.821 -4.614 -1.294 0.473 84.130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3722 9.709 < 2e-16 ***
## poly(zn, 3)1 -38.7498 8.3722 -4.628 4.7e-06 ***
## poly(zn, 3)2 23.9398 8.3722 2.859 0.00442 **
## poly(zn, 3)3 -10.0719 8.3722 -1.203 0.22954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.372 on 502 degrees of freedom
## Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
## F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
par(mfrow=c(2,2))
plot(boston.zn1)
boston.indus1<-lm(crim~poly(indus,3),data=Boston)
summary(boston.indus1)
##
## Call:
## lm(formula = crim ~ poly(indus, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.278 -2.514 0.054 0.764 79.713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.330 10.950 < 2e-16 ***
## poly(indus, 3)1 78.591 7.423 10.587 < 2e-16 ***
## poly(indus, 3)2 -24.395 7.423 -3.286 0.00109 **
## poly(indus, 3)3 -54.130 7.423 -7.292 1.2e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.423 on 502 degrees of freedom
## Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
## F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.indus1)
boston.nox1<-lm(crim~poly(nox,3),data=Boston)
summary(boston.nox1)
##
## Call:
## lm(formula = crim ~ poly(nox, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.110 -2.068 -0.255 0.739 78.302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3216 11.237 < 2e-16 ***
## poly(nox, 3)1 81.3720 7.2336 11.249 < 2e-16 ***
## poly(nox, 3)2 -28.8286 7.2336 -3.985 7.74e-05 ***
## poly(nox, 3)3 -60.3619 7.2336 -8.345 6.96e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.234 on 502 degrees of freedom
## Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
## F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.nox1)
boston.rm1<-lm(crim~poly(rm,3),data=Boston)
summary(boston.rm1)
##
## Call:
## lm(formula = crim ~ poly(rm, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.485 -3.468 -2.221 -0.015 87.219
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3703 9.758 < 2e-16 ***
## poly(rm, 3)1 -42.3794 8.3297 -5.088 5.13e-07 ***
## poly(rm, 3)2 26.5768 8.3297 3.191 0.00151 **
## poly(rm, 3)3 -5.5103 8.3297 -0.662 0.50858
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.33 on 502 degrees of freedom
## Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
## F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
par(mfrow=c(2,2))
plot(boston.rm1)
boston.age1<-lm(crim~poly(age,3),data=Boston)
summary(boston.age1)
##
## Call:
## lm(formula = crim ~ poly(age, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.762 -2.673 -0.516 0.019 82.842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3485 10.368 < 2e-16 ***
## poly(age, 3)1 68.1820 7.8397 8.697 < 2e-16 ***
## poly(age, 3)2 37.4845 7.8397 4.781 2.29e-06 ***
## poly(age, 3)3 21.3532 7.8397 2.724 0.00668 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.84 on 502 degrees of freedom
## Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
## F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.age1)
boston.dis1<-lm(crim~poly(dis,3),data=Boston)
summary(boston.dis1)
##
## Call:
## lm(formula = crim ~ poly(dis, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.757 -2.588 0.031 1.267 76.378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3259 11.087 < 2e-16 ***
## poly(dis, 3)1 -73.3886 7.3315 -10.010 < 2e-16 ***
## poly(dis, 3)2 56.3730 7.3315 7.689 7.87e-14 ***
## poly(dis, 3)3 -42.6219 7.3315 -5.814 1.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.331 on 502 degrees of freedom
## Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
## F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.dis1)
boston.rad1<-lm(crim~poly(rad,3),data=Boston)
summary(boston.rad1)
##
## Call:
## lm(formula = crim ~ poly(rad, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.381 -0.412 -0.269 0.179 76.217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.2971 12.164 < 2e-16 ***
## poly(rad, 3)1 120.9074 6.6824 18.093 < 2e-16 ***
## poly(rad, 3)2 17.4923 6.6824 2.618 0.00912 **
## poly(rad, 3)3 4.6985 6.6824 0.703 0.48231
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.682 on 502 degrees of freedom
## Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
## F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.rad1)
boston.tax1<-lm(crim~poly(tax,3),data=Boston)
summary(boston.tax1)
##
## Call:
## lm(formula = crim ~ poly(tax, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.273 -1.389 0.046 0.536 76.950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3047 11.860 < 2e-16 ***
## poly(tax, 3)1 112.6458 6.8537 16.436 < 2e-16 ***
## poly(tax, 3)2 32.0873 6.8537 4.682 3.67e-06 ***
## poly(tax, 3)3 -7.9968 6.8537 -1.167 0.244
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.854 on 502 degrees of freedom
## Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
## F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.tax1)
boston.ptratio1<-lm(crim~poly(ptratio,3),data=Boston)
summary(boston.ptratio1)
##
## Call:
## lm(formula = crim ~ poly(ptratio, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.833 -4.146 -1.655 1.408 82.697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.361 10.008 < 2e-16 ***
## poly(ptratio, 3)1 56.045 8.122 6.901 1.57e-11 ***
## poly(ptratio, 3)2 24.775 8.122 3.050 0.00241 **
## poly(ptratio, 3)3 -22.280 8.122 -2.743 0.00630 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.122 on 502 degrees of freedom
## Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
## F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
par(mfrow=c(2,2))
plot(boston.ptratio1)
boston.black1<-lm(crim~poly(black,3),data=Boston)
summary(boston.black1)
##
## Call:
## lm(formula = crim ~ poly(black, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.096 -2.343 -2.128 -1.439 86.790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3536 10.218 <2e-16 ***
## poly(black, 3)1 -74.4312 7.9546 -9.357 <2e-16 ***
## poly(black, 3)2 5.9264 7.9546 0.745 0.457
## poly(black, 3)3 -4.8346 7.9546 -0.608 0.544
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.955 on 502 degrees of freedom
## Multiple R-squared: 0.1498, Adjusted R-squared: 0.1448
## F-statistic: 29.49 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.black1)
boston.lstat1<-lm(crim~poly(lstat,3),data=Boston)
summary(boston.lstat1)
##
## Call:
## lm(formula = crim ~ poly(lstat, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.234 -2.151 -0.486 0.066 83.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3392 10.654 <2e-16 ***
## poly(lstat, 3)1 88.0697 7.6294 11.543 <2e-16 ***
## poly(lstat, 3)2 15.8882 7.6294 2.082 0.0378 *
## poly(lstat, 3)3 -11.5740 7.6294 -1.517 0.1299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.629 on 502 degrees of freedom
## Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
## F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.lstat1)
boston.medv1<-lm(crim~poly(medv,3),data=Boston)
summary(boston.medv1)
##
## Call:
## lm(formula = crim ~ poly(medv, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.427 -1.976 -0.437 0.439 73.655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.292 12.374 < 2e-16 ***
## poly(medv, 3)1 -75.058 6.569 -11.426 < 2e-16 ***
## poly(medv, 3)2 88.086 6.569 13.409 < 2e-16 ***
## poly(medv, 3)3 -48.033 6.569 -7.312 1.05e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.569 on 502 degrees of freedom
## Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
## F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(boston.medv1)
From the summary of each model it is clear that the cubic term is significant for **indus**, **nox**, **age**, **dis**, **ptratio**, and **medv**, indicating a non-linear relationship between those predictors and the response. For **zn**, **rm**, **rad**, **tax**, and **lstat** the quadratic term is significant even though the cubic term is not. For **black**, neither the quadratic nor the cubic coefficient is significant, so there is no evidence of a non-linear relationship.
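The non-linear terms can also be tested jointly with a nested-model F-test comparing the linear fit to the cubic fit. A sketch assuming `Boston` from `MASS`; `chas` is skipped because a 0/1 dummy cannot support a cubic polynomial:

```r
library(MASS)
for (p in setdiff(names(Boston), c("crim", "chas"))) {
  linear <- lm(reformulate(p, response = "crim"), data = Boston)
  cubic  <- lm(as.formula(paste0("crim ~ poly(", p, ", 3)")), data = Boston)
  # small p-value => the quadratic/cubic terms add real explanatory power
  cat(p, "p-value for non-linear terms:",
      anova(linear, cubic)[2, "Pr(>F)"], "\n")
}
```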